Speculative Checkpointing
Abstract
In large-scale parallel systems, storing memory images through checkpointing involves massive amounts of concentrated I/O from many nodes, resulting in considerable execution overhead. For user-level checkpointing, overhead reduction usually involves both spatial reduction, i.e., reducing the amount of checkpoint data, and temporal reduction, i.e., spreading out I/O by checkpointing data as soon as their values become fixed. System-level checkpointing, by contrast, is generic and effortless for the end user, but most efforts there have focused on simple methods for spatial reduction only. Instead, we propose speculative checkpointing, which is an attempt to exploit temporal reduction in system-level checkpointing. We demonstrate that speculative checkpointing can be implemented as a simple extension of incremental checkpointing, a well-known checkpointing optimization algorithm for spatial reduction. Although speculative checkpointing is shown to be useful and effective, its overall effectiveness is greatly affected by the last-write heuristics of pages, so it is difficult to determine the theoretical upper bound of its effectiveness in practical applications. In order to analyze this, we construct a checkpointing
Similar resources
Checkpointing Speculative Distributed Shared Memory
This paper describes a checkpointing mechanism designed for Distributed Shared Memory (DSM) systems with speculative prefetching. Speculation is a general technique involving prediction of the future of a computation, namely accesses to shared objects unavailable on the accessing node (read faults). Thanks to such predictions, objects can be fetched before the actual access operation is performe...
Speculation Meets Checkpointing
This paper describes a checkpointing mechanism designed for Distributed Shared Memory (DSM) systems with speculative prefetching. Speculation is a general technique involving prediction of the future of a computation, namely accesses to shared objects unavailable on the accessing node (read faults). Thanks to such predictions, objects can be fetched before the actual access operation is performe...
Using Speculative Push for Unnecessary Checkpoint Creation Avoidance
This paper discusses a way of incorporating speculation techniques into Distributed Shared Memory (DSM) systems with a checkpointing mechanism without creating unnecessary checkpoints. Speculation is a general technique involving prediction of the future of a computation, namely accesses to shared objects unavailable on the accessing node (read faults). Thanks to such predictions, objects can be p...
Putting checkpoints to work in thread level speculative execution
With the advent of Chip Multi Processors (CMPs), improving performance relies on programmers and compilers to expose thread-level parallelism to the underlying hardware. Unfortunately, this is a difficult and error-prone process for programmers, while state-of-the-art compiler techniques are unable to provide significant benefits for many classes of applications. An interesting alternative ...
An Attributed, Time-Delayed Rendezvous Model for Parallel Discrete Event Simulation
This report extends the model for parallel discrete event simulation presented in [9–11] by introducing the notion of attributes (messages). The model is based on the notions of processes and gates and on the rendezvous mechanism defined in the basic Lotos process algebra [2]. Time is introduced via a mechanism similar to the delay behaviour annotation provided by the Topo toolset [4–6]. Commun...